Are Thesauri Useful in Cross-Language Information Retrieval?
Abstract
Digital libraries relating to particular subject domains have invested a great deal of human effort in developing metadata in the form of subject area thesauri. This effort has emerged more recently in artificial intelligence as ontologies or knowledge bases which organize particular subject areas. The purpose of subject area thesauri is to provide organization of the subject into logical, semantic divisions as well as to index document collections for effective browsing and retrieval. Prior to free-text indexing (i.e. the bag-of-words approach to information retrieval), subject area thesauri provided the only point of entry (or 'entry vocabulary') to retrieve documents. A debate began over thirty years ago about the relative utility of the two approaches to retrieval:
Similar resources
Cross-Language Information Retrieval in a Multilingual Legal Domain
We describe here the application of a cross-language information retrieval technique based on similarity thesauri in the domain of Swiss law. We present the theory of similarity thesauri, which are information structures derived from corpora, and show how they can be used for cross-language retrieval. We also discuss the collections of Swiss legal documents and show how we have used them to co...
Automatically-extracted Thesauri for Cross-language IR: When Better Is Worse
A statistical algorithm for extracting bilingual term dictionaries (thesauri) from parallel text is presented, along with refinements for improving their size and accuracy. Somewhat paradoxically, increasing the accuracy of the extracted thesaurus can in fact reduce the performance of an IR system using it to perform query translation for cross-language information retrieval.
Similarity Thesauri and Cross-Language Retrieval
This paper describes a method for constructing a thesaurus automatically from a corpus of suitable documents, using standard information retrieval methods. The resulting thesauri can be used for user-initiated query expansion, automatic query expansion, as well as cross-language retrieval. Researchers at the Swiss Federal Institute of Technology in Zürich developed and evaluated this method in ...
Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval
OBJECTIVES: We present in this article experiments on multi-language information extraction and access in the medical domain. For such applications, multilingual terminology plays a crucial role when working on specialized languages and specific domains. MATERIAL AND METHODS: We propose firstly a method for enriching multilingual thesauri which extracts new terms from parallel corpora, and seco...
Thesaurus Mapping for Promoting Semantic Interoperability of European Public Services
Interoperability of eGovernment information systems is essential to provide advanced services to citizens. This work proposes a framework for implementing interoperability among thesauri to promote cross-collection and cross-language information retrieval, as well as a specific approach within this framework, applied to a case study aimed at mapping five thesauri of interest for the European Union in...